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Abstract 

Sensitivity indices when the inputs of a model are not independent are estimated by local polynomial techniques. Two original 
estimators based on local polynomial smoothers are proposed. Both have good theoretical properties which are exhibited and 
also illustrated through analytical examples. They are used to carry out a sensitivity analysis on a real case of a kinetic model 
with correlated parameters. 
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' Achieving better knowledge of refining processes usually re- 
^^uires to build a kinetic model predicting the output compo- 
ne'nts produced in a unit given the input components intro- 
duced (the "feed" ) and the operating conditions. Such a model 
based on the choice of a reaction mechanism depending on 
'-carious parameters (e. g. kinetic constants). But the complexity 
o f the mechanism, the variability of the behavior of catalysts 
^jvhen they are used and the difficulty of observations and exper- 
■^iments imply that most often these parameters cannot be in- 
i^^srred from theoretical considerations and need to be estimated 
'nhrough practical experiments. This estimation procedure leads 
o consider them uncertain and this uncertainty spreads on the 
Cifaodel predictions. This can be highly problematic in real sit- 

§.ations. ft is then essential to quantify this uncertainty and 
o' study the influence of parameters variations on the model 
• eutputs through uncertainty and sensitivity analysis. 

• , During the last decades much effort in mathematical analy- 
>^is of physical processes has focused on modeling and reasoning 
jjdth uncertainty and sensitivity. Model calibration and vali- 
dation are examples where sensitivity and uncertainty analysis 
have become essential investigative scientific tools. Roughly 
speaking, uncertainty analysis refers to the inherent variations 
of a model (e.g. a modeled physical process) and is helpful 
in finding the relation between some variability or probability 
distribution on input parameters and the variability and prob- 
ability distribution of outputs, while sensitivity analysis inves- 
tigates the effects of varying a model input on the outputs and 
ascertains how much a model depends on each or some of its 
inputs. 

Over the years several mathematical and computer-assisted 
methods have been developed to carry out global sensitivity 
analysis and the reader may refer to the book of Saltelli, Chan 
& Scott (2000) for a wide and thorough review. Amongst these 
methods a particular popular class is the one composed by 
"variance-based" methods which is detailed below. Let us con- 



sider a mathematical model given by 

Y = r?(X) 



(1) 



where 77 : M. d — > M is the modeling function, Y £ R represents 
the output or prediction of the model and X = (Xi, JCj) is 
the d-dimensional real vector of the input factors or parame- 
ters. The vector of input parameters is treated as a random 
vector, which implies that the output is also a random variable. 
In variance-based methods, we are interested in explaining the 
variance Var(F) through the variations of the X$, i = l,...,d 
and we decompose Var(F) as follows : 

Var(F) = Var(E(Y"|A l )) + E(Var(y|X;)) 

for all i — l,...,d where E(Y|JQ) and Var(Y"|X;) are respec- 
tively the conditional expectation and variance of Y given Xi. 
The importance of Xj on the variance of Y is linked to how 
well E(y|X) fits Y and can then be measured by the first or- 
der sensitivity index 



Si = 



Var(E(y|X)) 
Var(F) 



also called correlation ratio. We can also introduce sensitivity 
indices of higher orders to take into account input interactions. 
For example, the second order sensitivity index for Xi and Xj 
is 



Si 



Var(E(y \Xj, Xj)) - Var(E(F |X)) - Var(E(F \Xj)) 
Var(F) ' 



and so on for other orders, see Saltelli et al. (2000) for details. 

In the case of independent inputs, two techniques, Sobol 
(SoboF 1993) and FAST (Cukier, Fortuin, Shuler, Petschek & 
Schaibly 1973) are the most popular methods for estimating 
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the Si indices. Although powerful and computationally effi- 
cient, these methods rely on the assumption of independent 
inputs which is hard to hold in many practical cases for ki- 
netic models. Nevertheless, three original methods, originated 
by Ratto, Tarantola & Saltern (2001), Jacques, Lavergne & De- 
victor (2004) and Oakley & O'Hagan (2004), try to deal with 
this problem. The first one sets out to calculate the sensitivity 
indices by using a replicated latin hypercube sampling, but this 
approach requires a large amount of model evaluations to reach 
an acceptable precision. The second one is based on the idea 
of building new sensitivity indices which generalize the orig- 
inal ones by taking into account block of correlations among 
the inputs. This method is however useless when many input 
factors are correlated. The last approach is that of Oakley & 
O'Hagan (2004) and rely upon the idea of approximating the 
function r\ in model ([1]) by a so-called 'kriging' response surface 
(Santner, Williams & Notz 2003) and of computing analyti- 
cal expressions of the sensitivity indices based on the results of 
the kriging approximation. However appealing and accurate, 
these analytical expressions involve multidimensional integrals 
that are only tractable when the conditional densities of the 
input factors are known and easy to integrate. If this is not 
the case the multidimensional integrals must be approximated 
numerically, but at high computational cost. We then propose 
a new way of estimating sensitivity indices through an interme- 
diate technique in the sense that it is based on a sample from 
the joint density of the inputs and the output like Ratto et al. 
(2001) but also on a nonparametric regression model like Oak- 
ley & O'Hagan (2004). This approach does not require as many 
model evaluations as Ratto et al. (2001) and does not require to 
approximate multidimensional integrals as Oakley & O'Hagan 
(2004) in the general case. 

In this paper to deal with correlated inputs we consider a 
new method based on local polynomial approximations for con- 
ditional moments (see the work of Fan & Gijbels (1996) and 
Fan, Gijbels, Hu & Huang (1996) on conditional expectation 
and the papers of Fan & Yao (1998) and Ruppert, Wand, Hoist 
& H'ssjer (1997) on the conditional variance). Given the form 
of the sensitivity indices, local polynomial regression can be 
used to estimate them. This approach not only allows to com- 
pute a sensitivity index through an easy black-box procedure 
but also reaches a good precision. 

The paper is organized as follows. In Section 1 we review 
the methods of Ratto et al. (2001), Jacques et al. (2004) and 
Oakley & O'Hagan (2004) and discuss their merits and draw- 
backs. In Section 2 we propose and study two new estimators 
for sensitivity indices relying on local polynomial methods. In 
Section 3 we present both analytical and practical examples. 
In Section 4 we finally give some conclusions and directions for 
future research. 

I. MODELS WITH CORRELATED INPUTS 

When the inputs are independent, Sobol showed that the sum 
of the sensitivity indices of all orders is equal to 1 , due to an or- 
thogonal decomposition of the function ij (Sobol' 1993). Indeed 
sensitivity indices naturally arise from this functional ANOVA 
decomposition. Nevertheless, when the inputs are correlated, 
this property does not hold anymore because such a decompo- 



sition can not be done without taking into account the joint 
distribution of the inputs. If one decides to estimate sensitivity 
indices under the independence hypothesis although it does not 
hold, results and consequently interpretation can be higly mis- 
leading, see the first example of Section 3.1. But we can still 
consider the initial ANOVA decomposition and work with the 
original sensitivity indices without ignoring the correlation, and 
when quantifying the first order sensitivity index of a particular 
input factor a part of the sensitivity of all the other input fac- 
tors correlated with it is also taken into account. Thus the same 
information is considered several times. Interpretation of sen- 
sitivity indices when the inputs are not independent becomes 
problematic. However, the input factors being independent or 
not, the first-order sensitivity index still points out which factor 
(if fixed) will mostly reduce the variance of the output. Thus, 
if the goal of the practitioner is to conduct a 'Factors Priori- 
tisation' (Saltelli, Tarantola, Campolongo & Ratto 2004), i.e. 
identifying the factor that one should fix to achieve the greatest 
reduction in the uncertainty of the output, first-order sensitivity 
indices remain the measure of importance to study, see Saltelli 
et al. (2004). Considering that this goal is common for prac- 
titioners, being able to compute first-order sensitivity indices 
when the inputs are no longer independent is an interesting 
challenge. 

Beyond this problem of interpretation, correlation also 
makes the computational methods FAST and Sobol unusable 
as they have been designed for the independent case. To get 
over these difficulties, it is first possible to build 'new' sensitivity 
indices that would generalize the original ones and match their 
properties, allowing interpretation. This is the idea of multidi- 
mensional sensitivity analysis of Jacques (Jacques et al. 2004) 
detailed in the next section. Secondly, Ratto et al. (2001) tried 
to continue on working with the original sensitivity indices and 
to compute them as described in Section 1.3, even if they do not 
give clues for interpretation. The authors generate replicated 
latin hypercube samples to approximate conditional densities. 
Finally, Oakley & O'Hagan (2004) suggest to approach the 
function -q in model ([I]) by a kriging response surface which al- 
lows to get analytical expressions of sensitivity indices through 
multidimensional integrals. 

1.1 Multidimensional Sensitivity Analysis 

To define multidimensional sensitivity indices, Jacques et al. 
(2004) suggest to split X into p vectors Uj, j = l f ...,p, each 
of size kj such that TX, is independent from U; for 1 < j, I < p, 
3+1--' 

X = (Xi, ...,Xd) — ( X 1: X kl , X kl+1 , x kl+k2 ^ , ... 

Ui u 2 

Afci + fc2 + ...+fcp-i + l; A fel+fc2 + ... + fep ) 



where k\ + k 2 + .... + k p = d. For example, if X = (X\, X 2l X 3 ) 
where X\ is independent of X 2 and X 3 but X 2 and X 3 are 
correlated, we set Ui = X x and U 2 = (X 2 , X 3 ). 

Thus they build first order multidimensional sensitivity in- 
dices using the Uj vectors : 
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Var(E(y|U J -)) 
1 " Var(Y) 

Var(E(y|X fcl+fc2+ „, +fcj ._ 1+ i,...,X fcl+ fc 2+ .„ +fcj )) 
Var(Y) 

for j — l,...,p. Remark that if the inputs are independent, 
these sensitivity indices have the same expression as in classi- 
cal sensitivity analysis. Finally, it is possible to compute these 
indices by Monte-Carlo estimations. 

Nevertheless, this method has a main drawback hard to 
overcome. If all the inputs are correlated, the Uj vectors cannot 
be defined (except the trivial case Ui = X) and interpretation 
is not possible. The problem remains the same if too many 
inputs are dependent because this situation leads to consider 
very few multidimensional indices. Moreover, identifying a set 
of correlated variables Uj with high sensitivity index does not 
allow to point up whether this is due to one particular input of 
the set as we cannot differentiate among them. We will illus- 
trate this phenomenon in the second example of Section 3.1. 

1.2 Correlation-Ratios With Known Conditional Density 
Functions 

The estimator introduced by Ratto et al. (2001) was first dis- 
cussed in McKay (1996) and is based on samples from the con- 
ditional density functions of Y given X i} i = 1, d. 

Let (X J )j = i ; ... ; „ be an i.i.d sample of size n from the distri- 
bution of the vector X. (-^/ )j=i,...,n 1S then an i.i.d. sample of 
size n from the distribution of the input factor JQ. For each 
realization Xf of this sample, let (Y/' fe )fc=i r be an i.i.d. sam- 
ple of size r from the conditional density function of Y given 
Xi = X\ and define the sample means 

k=i j=i 

Note that Y\ and -X^=i(^? ~ Y\) 2 respectively estimate 
the conditional expectation E(Y|Xj = X?) and the conditional 
variance Var(Y\Xi = Xf), while estimates E(Y). 

Using these moments estimators the numerator of the first 
order sensitivity index Si, Var(E(Y|Xj)), can be estimated by 
the empirical estimator 

3=1 

Similarly the denominator of Si, Var(Y), is estimated by 

U j=l T fc=l 

The estimator of the first order sensitivity index Si of the input 
factor Xi , i = 1 , . . . , d is then defined as 

a _ SSB 
2 SST 

where 



SSB = rJ2( Y i- Y i) 2 

3=1 

and 

n r 

SST = Y,^ Y i k -Yif- 

3=1 fe=l 

To compute these indices and to generate the samples needed, 
Ratto uses two different methods : pure Monte-Carlo sampling 
and a single replicated Latin HyperCube (r-LHS) sampling. 

It is crucial to note, however, that these two methods require 
a huge amount of model evaluations to reach a good precision 
and can only be used for cases where model runs have very low 
computational cost. 

1.3 Bayesian Sensitivity Analysis 

The idea of Oakley & O'Hagan (2004) is to see the function 
as an unknown smooth function and to formulate a prior dis- 
tribution for it. More precisely, it is modeled as the realization 
of a Gaussian stationary random field with given mean and co- 
variance functions. Then, given a set of of values yi = ^(xj), we 
can derive the posterior distribution of jj(-) by classical Bayesian 
considerations. The prior distribution of r/(x) is a Gaussian sta- 
tionary field : 

»y(x) = h(x)*/3 + Z(x) 

conditionally on (3 and cr 2 , where h(-) is a vector of q known 
regression functions and Z(X.) is a Gaussian stationary random 
field with zero mean and covariance function cr 2 c(x, x'). The 
vector h(-) and the correlation function c(-, •) are to be chosen 
in order to incorporate some information about how the output 
responds to the inputs and about the amount of smoothness 
we require on the output respectively. We refer the reader to 
Santner ct al. (2003) and to Kennedy & O'Hagan (2001) for 
a detailed discussion on these choices. The second stage prior 
concerns the conjugate prior form for (3 and cr 2 , which is chosen 
to be a normal inverse gamma distribution. Now assuming we 
observe a set y of n values of j/j = r/(xi), we can derive that 
the posterior distribution of rj(-) given these data is a Student 
distribution, see Oakley & O'Hagan (2004) for details. 
Using this posterior distribution, sensitivity indices can be com- 
puted analytically through multidimensional integrals involving 
functions of the observations and the conditional distributions 
of the input factors only. The main advantage of this Bayesian 
approach is that the model is only evaluated to calculate the 
quantities above, i.e. to 'fit' the response surface. Once this 
is done the estimation of sensitivity indices just involves the 
conditional distributions of the input factors. When the num- 
ber of model runs is fixed, this method clearly reduces the 
standard errors of the estimated sensitivity indices obtained by 
Monte-Carlo methods such as Sobol (when the input factors 
are independent) and can still be used when the input factors 
are not independent. 

However, the multidimensional integrals leading to the com- 
putation of the sensitivity indices, if not tractable analytically, 
need to be estimated. Let us describe more particularly one 
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of the estimators proposed in Oakley & O'Hagan (2004). We 
keep the authors notations and denote by E* the expectations 
defined with respect to the posterior distribution of rj(-). The 
numerator of the first-order sensitivity index of Y with respect 
to X\ is estimated by 

E*(Var(E(F|Xi))) = E*(E(E(T |Xi) 2 )) - E*(E(Y~) 2 ) 

and one of the quantities involved in the computation of 
E*(E(E(Y|Xi) 2 )) is for example 

Ui = [ f [ c(x,x*) dJUntx-xln) 

cLF_i| 1 (x / _i|i 1 ) dFi(si) 

where is the marginal distribution of X_i (subvector 

of X containing all elements except X%) given X\, F\ is the 
marginal distribution of X\ and x* denotes the vector with ele- 
ments made up of xi and x^_ 1 in the same way as x is composed 
of xi and x_i. If the conditional distribution F-im is not an- 
alytically known, we first need to estimate it with a sample of 
the joint distribution F. Many methods have been developed to 
do so, let us just mention for example kernel techniques. But in 
general in high dimension the data is very sparsely distributed 
and it is difficult to get an accurate approximation of condi- 
tional distributions since the so-called curse of dimensionality 
arises. For instance the best possible MSE rate with kernel 
techniques is n~ 4 /' 4+d ) which becomes worse as d gets larger. 
Moreover, even if we could get a good approximation of -FIim, 
still remains the problem of evaluating the multidimensional 
integrals. Indeed the dimensionality of these integrals can 
reach 2d — 1 as in the expression of U\ above. Since these 
integrals can not in general be separated into unidimensional 
integrals, approximating them with a sufficent accuracy is not 
an obvious mathematical problem. Deterministic schemes can 
not reasonably be considered, and with Monte-Carlo or quasi 
Monte-Carlo sampling (Owen 2005) thousands (or millions) of 
draws are required to get a reasonable accuracy. 

With unknown densities, even if conceptually, sampling 
rather than analytical integration in the Oakley and O'Hagan 
approach seems reasonable, the results could be highly af- 
fected by the curse of dimensionality. Let us mention that Pr. 
O'Hagan has public domain software carrying out this analysis. 
However it does not yet allow to consider dependent inputs. 

2. NEW ESTIMATION METHODOLOGY 

Our approach is to estimate the conditional moments E(Y|Xj = 
Xf) and Var(Y\Xi — Xf) with an intermediate method be- 
tween the one of Ratto et al. (2001) and Oakley & O'Hagan 
(2004). We first use a sample (Xi,Yi) to estimate the condi- 
tional moments with nonparametric tools (provided they are 
smooth functions of the input factors). Then, we compute 
sensitivity indices by using another sample of the input factors 
only (and thus no more model runs are needed). While Oakley 
& O'Hagan (2004) approximate the function 7j(X) in M. d , we 
approximate it marginally, i.e. we approximate the conditional 
expectations E(ry(X)|X;) in R. This approach allows to over- 
come the multidimensional integration problem of the Bayesian 



sensitivity analysis. 

To simplify the notations, until Section 2.4 (X,Y) will 
stand for a bivariate random vector (i.e. X is unidimen- 
sional). As the variance may be decomposed as Var(Y) = 
Vax(E(Y\X)) + E(Var(Y|X)), the index we wish to estimate 
can be written 



_ Var(E(Y|X)) E(Var(r|X)) 
Var(Y) 1 Var(F) ' 



(2) 



These expressions clearly give two ways of estimating S : the 
issue is to be able to estimate Var(E(Y|X)) or alternatively 
E(Var(Y"|A)), obviously by estimating first the conditional mo- 
ments E(Y\X = x) and Yai(Y\X = x) (i 6 1). In both cases 
the denominator term Var(Y) can be easily estimated. To ap- 
proximate the conditional moments, we propose to use local 
polynomial regression. This highly statistical efficient tool is 
easy to apprehend as it is close to the weighted least-squares 
approach in regression problems. Only basic results will be pre- 
sented here, for a detailed picture of the subject the interested 
reader is referred to Fan & Gijbels (1996). 



2.1 Formulation of the Estimators 

Let (Xi,Yi)i—i t ,,, tTl be a two-dimensional i.i.d. sample of a real 
random vector (X, Y) . Assuming that X and Y are square in- 
tegrable we may write an heteroskedastic regression model of 
Yi on Xi, exhibiting the conditional expectation and variance, 
as 

Yi = m(Xi) + a(Xi)eu i=l,...,n 

where m(x) = E(Y|X = x) and a 2 (x) = Var(Y|A = x) 
(x G M) are the conditional moments and the errors e%, . . . , e„ 
are independent random variables satisfying E(e;|X;) = and 
Var(ej|Aj) = 1. Usually and Xi are assumed to be inde- 
pendent although this is not the case in our work. Note that 
results for correlated errors have been recently developed (Vilar- 
Fernandez & Francisco-Fernandez (2002) for the autoregressive 
case for example). Local polynomial fitting consists in approx- 
imating locally the regression function m by a p-th order poly- 
nomial 

p 

m ( z ) ~^Pji z ~x) J 

3=0 

for z in a neighborhood of x. This polynomial is then fitted to 
the observations pQ,Yj) by solving the weighted least-squares 
problem 



2 V 

™ n £(V5>(*i-<) ^(^) (3) 

3=0 ' ' 



where K\(.) denotes a kernel function and hi is a smooth- 
ing parameter (or bandwidth). In this case, if (3(x) = 
(Po(x), ...,f3 p (x)) T denotes the minimizer of (|3|) we have 

m(x) = 0o(x), 

while the is-th derivative of m(x) is estimated via the relation 
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see Fan & Gijbels (1996) for more details. As it will be dis- 
cussed later, the smoothing parameter hi is chosen to balance 
bias and variance of the estimator. Finally, remark that the 
particular case p — (constant fit) leads to the well-known 
Nadaraya-Watson estimator rriNw(x) of the conditional expec- 
tation, given explicitly by 

m NW (x) = i-i , 

8=1 

see Wand & Jones (1994). 

Estimation of the conditional variance is less straightfor- 
ward. If the regression function to was known, the prob- 
lem of estimating cr 2 (.) would be regarded as a local poly- 
nomial regression of r 2 on X$ with r\ = (Yi — m(Xi)) 2 , as 
E(r 2 \X = x) = a 2 {x) with r 2 = (Y - m{X)) 2 . But in practice, 
to is unknown. A natural approach is to substitute m(.) by 
its estimate rh(.) defined as above and to get the the residual- 
based estimator d 2 {x) by solving as previously the weighted 
least-squares problem 

n q 2 Y 

min E(/*-E^-*) J ) ^Fp) (4) 

7 i=l j=l 2 

where rf = (Yi — rh(Xi)) 2 , -ft^G) is a kernel and hi a smoothing 
parameter. Note that the kernel -FGO) is not necessarily chosen 
to be equal to the kernel Ki(.). Then 

o- 2 (x) = jo(x) 

where j(x) = (70 (x), ...,j q (x)) is the minimizer of (|4]) . As pre- 
viously, the smoothing parameter hi has to be chosen to balance 
bias and variance of the estimator, see Fan & Yao (1998). 

Going back over the equalities in the last step is to es- 
timate the quantities Var(E(Y"|A)) and E(Var(Y"|A)) by using 
the local polynomial estimators for the conditional moments 
defined right above. To do this let us assume we have an- 
other i.i.d. sample (Xj)j = i,... in ' with same distribution as X. 
If the functions to(.) and o 2 (.) were known, we could estimate 
Var(E(Y~|A)) = Var(m(A)) and E(Var(Y~|A)) = E(cr 2 (A)) 
with the classical empirical moments 

1 n ' 2 1 

— J] mA)-m and 

1 

where to — — V^m(Xj). As to(.) and cr 2 (.) are unknown, the 
. n j=1 

main idea is to replace them by their local polynomial estima- 
tors which leads to consider 

"' 2 1 

f i = — x E - ™) and f * = -7 E ^) 
1 

where to = — to(Aj) and m(.) and <r 2 (.) arc the local poly- 
pi 

3=1 

nomial estimators of to(.) and <r 2 (.) introduced above. It is 



important to note that we need two samples, the first one 
(Xi, Yi)i=i,...,n to compute m(.) and <r 2 (.) and the second one 
(-X"j)i=ii t° fi na lly compute the empirical estimators 7\ and 

2.2 Bandwidth and Orders Selection 

The selection of the smoothing parameters h\ and /i 2 and to a 
lesser extent of the polynomials orders p and q can be crucial 
to get the least mean squared error (MSE) of the estimators 
Ti and T^. Classically the MSE consists of a bias term plus 
a variance term and so is minimized by finding a compromise 
between bias and variance. 

Concerning this choice, the reader is referred to Fan et al. 
(1996), Fan & Yao (1998) or Ruppert (1997). Most of the meth- 
ods suggested by these authors rely upon asymptotic arguments 
and their efficiency for finite sample cases is not clear. In prac- 
tice cross-validation methods can be used for the finite sample 
case (Jones, Marron & Sheather 1996), but in the examples 
of Section 3 we will use the empirical-bias bandwidth selector 
(EBBS) of Ruppert which appears to be efficient on simulated 
data. EBBS is based on estimating the MSE empirically and 
not with an asymptotic expression. The choice of the polyno- 
mials orders is more subjective. Concerning the estimation of 
the conditional expectation, Fan & Gijbels (1996) recommend 
to use a v + 1 or v + 3th-order polynomial to estimate the vth- 
derivative of m(x), following theoretical considerations on the 
asymptotic bias of rh{x) on the boundary. We would then be 
lead to take p = 1 or p = 3 to estimate the Oth-dcrivative m{x). 
But Ruppert, Wand & Carroll (2003) suggest that this conclu- 
sion should be balanced by simulation studies and stress that 
p = 2 often outperforms p — 1 and p = 3. The only common 
conclusion is that local linear regression (p = 1) is usually supe- 
rior to kernel regression (Nadaraya-Watson estimator obtained 
with p = 0). This is the reason why we will only consider and 
study local linear regression for m(x) in the next theoretical and 
practical sections. The choice is still difficult when estimating 
the conditional variance as we have to choose p and q simul- 
taneously. One more time, the authors are not unanimous : 
Fan & Yao (1998) recommend the case p = l,q = 1 whereas 
Ruppert et al. (1997) suggest p = 2, q = 1 or p = 3, q = 1. 
However on the simulations we have carried out, the choice of 
p = 1, q = 1 is adequate and satisfactory in terms of precision. 
This is the reason why we have decided to consider only the 
case p — l,q = 1 for both theoretical and practical results. 

2.3 Theoretical Properties of the Estimators 

The properties of T\ and T2 strongly depend on the asymp- 
totic results on the bias and variance of the local linear esti- 
mators to(.) and o" 2 (.). We only give here two main results, all 
assumptions (Aq, Ai, Bq, B4,, Co) and proofs are given in 
appendix for readability. Ex and Varx stand for the conditional 
expectation and variance given the predictors X = (Ai, X n ). 
The expression op(ip{h)) is equal to ip{h)op{l) for a given func- 
tion tp. Here op(l) is the standard notation for a sequence of 
random variables that converges to zero in probability. 

Theorem 1 Under assumptions (A0)-(A4) and (CO), the es- 
timator Ti is asymptotically unbiased. More precisely 
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Ex(fi) = Var(E(Y\X)) + M x h\ + ^+ o P {h\). 

where Mi and M 2 are constants given in appendix. 

Remark 1. It would be interesting to calculate the variance 
of this estimator, but it would require the expressions of the 
third and fourth moments of the local linear estimator m(.) 
(see the appendix). This is not an obvious problem and to the 
best of our knowledge it has not been addressed in the liter- 
ature. It is beyond the scope of the present paper but it is 
an interesting problem for future research. Nevertheless, the 
variance can be estimated on practical cases through bootstrap 
methods for example (Efron & Tibshirani 1994). 

Theorem 2 Under assumptions (B0)-(B4) and (CO), the es- 
timator T 2 is consistent. More precisely 

E x (f 2 ) = E(Var(Y\X)) + V x h\ + o P (h 2 + h 2 2 ) 



Varx(T 2 ) = 



1 

n 1 



E(Var(Y\X) 2 ) + V 2 h\ + V 3 h\ 



nh 2 



where V\, V 2 , V3 and V4 are constants given in appendix. 
2.4 Application to Sensitivity Analysis 

Let us come back to the model ([1]), where X is multidimen- 
sional. The goal is to get an estimate of Si for i = 1, . . . , d by 
using one of the two estimators T\ and T 2 ■ We need two samples 
to compute each of them, i.e. a sample (X k , Y )k=i,..., n to esti- 
mate rh{.) and <x 2 (.) and a sample {X\)i—x^ ...y to get T\ and T 2 
where (X k )k=i,...,n and {X\)i=i t ... in > are samples from the joint 
distribution of the d-dimensional input factors X = {Xiji—i^..^ 
and (Y )k=i,...,n a sample of the output Y. Note that the model 
is run just for the first sample and not for the second one. Three 
situations can arise : 

1. Sampling from the joint distribution of X has low 
computational cost and running the model to compute 
(Y k )k = i,..., n is cheap. This is the ideal situation. In- 
deed in this case the two samples (X k ,Y k )/, =1 ^ n and 
{X\)i = i^ ^ n i can be generated independently and be as 
large as required ; 

2. Sampling from the joint distribution of X has low com- 
putational cost but model evaluations have not. In this 
case (also pointed out by Oakley & O'Hagan (2004)) a 
moderate-sized sample {X k ,Y k )^ = i^ „ is used in order 
to fit the conditional moments. However to compute T\ 
and T 2 we can then use a sample (X|); = i ) ... >n ' of large 
size ; 

3. Sampling from the joint distribution of X has high compu- 
tational cost. This case can arise in practice for example 



when the input factors are obtained through a procedure 
based on experimental data and optimization routines. 
We then have an initial sample (XP)j=i,...,N of limited 
size N that we wish to use for the two steps of the esti- 
mation. The first idea is to split it and to use the first part 
to get the sample (X k , Y k )k=i.... t n and the second one to 
get (X-)i=i ! . ..,„'. The drawback of this method clearly 
arises if N is very small. Another way to tackle the prob- 
lem is to use the well-known leave-one-out idea procedure 
which gives better approximation than data splitting. 
As suggested by the Associate Editor another possible 
method could be to use the sample of size N to esti- 
mate the conditional moments and to estimate also the 
marginal densities of each input using for instance a non- 
parametric density estimator. One could then use these 
density estimates to get the sample (X-)/ = i, ...,„<. The 
clear disadvantage of this procedure is that it may bias 
the final estimators. Some simulation runs not reported 
here for lack of space show that such a procedure leads 
to less efficient estimates probably due to the large bias 
produced by nonparametric methods. 

The last situation obviously leads to the less accurate ap- 
proximations of first-order sensitivity indices. However in gen- 
eral, litterature and results on sensitivity analysis assume that, 
if not analytically known, the joint distribution of the input fac- 
tors can at least be generated at low computational cost. This 
is the reason why we will only describe here the procedure for 
estimating first-order sensitivity indices in case 1 or 2. We now 
assume that we have two samples (X k )k=i,..., n and (^)z=i,..., n ' 
obtained by one of the methods described right above. 



The estimation procedure for Si = 



mg : 



Var(E(r|A,)) 
Var(F) 



is the follow- 



Step 1 : Compute the output sample (Y k 
ning the model at (X. k )k=i,...,n 



lfe=i, 



by run- 



Step 2 : Compute &y, the classical unbiased estimator of 
the variance Var(F) 

fc=i 

Step 3 : Use the sample (X k , Y k ) k = lt ..^ n to obtain m{X\) 
for I — 1, ... ,n' and rh(X k ) for k = 1, . . . , n using the smooth- 
ing parameter hi given by EBBS 

Step 4 : Compute squared residuals fk = (Y h — m(X k )) 2 for 
k = 1, . . . , n and apply the smoothing parameter h 2 obtained 
by EBBS to compute a- 2 {X\) for / = 1, . . . , n' 

Step 5 : Compute T\ with rh(X\) for I — 1, ... ,n' from Step 
3 and compute T 2 with a 2 (X\) for / = 1, . . . , n' from Step 4 

Step 6 : The estimates of Si are then 



S 



W-_lL and s\ 2) = l-^. 



(5) 



'Y 



G 



To obtain all the first-order sensitivity indices, repeat the 
procedure from Step 3 to Step 6 for i = 1, d. 

Remark 2. Given the theoretical properties of T\ and T 2 
and more precisely their non-parametric convergence rate, we 
can also expect a nonparametric convergence rate for and 
S«. 

Remark 3. In practice, our simulations show that n of the 
order of 100 and n' around 2000 are enough for accurate esti- 
mation of the sensitivity indices. 



indicating that X\ should be the input to be fixed to reach the 
higher variance reduction on Y. But if one neglects the corre- 
lation, by computing for instance these indices with the FAST 
method, i.e. working with a three-dimensional normal vector 
with mean and covariance matrix / instead of T, one would 
estimate 

S? = 0.2907, S% = 0.2907, S% = 0.4186 



3. EXAMPLES 

In all the following examples we use the two estimators S^ 1 ' 
and defined in ([5]). As mentioned in Section 2.2, the condi- 
tional expectation is estimated here with local linear regression 
(p = 1) and the conditional variance with p = 1 and q = 1, 
the bandwidths being selected by the estimated-bias method of 
Ruppert (1997). 



3.1 Analytical Examples 

In this section, we carry out two different comparisons in order 
to study our two estimators from a numerical point of view. 
The first model has been chosen to underline their precision in 
correlated cases when FAST and Sobol methods are no longer 
efficient and when Jacques' approach for multidimensional sen- 
sitivity analysis is limited. We also show how interpretation 
with sensitivity indices obtained by neglecting correlation can 
be false. The second one is an example illustrating the perfor- 
mance of our estimators with respect to the method of Oakley 
and O'Hagan in a two-dimensional setting. 

In the first analytical example, we study the model 

Y = Xi + X 2 + A 3 

where (Ai,A2, A3) is a three-dimensional normal vector with 
mean and covariance matrix 



1 
1 pa 
pa a 2 



where S° stands for the sensitivity indices when p = 0. These 
results indicate that A3 should be fixed to mostly reduce the 
variance of Y, which is absolutely wrong as the calculations 
above have shown. This simple example highlights the dan- 
ger of neglecting the correlations between the inputs and the 
importance to take them into consideration when computing 
sensitivity indices. 

Otherwise, applying Jacques' idea to Ai and the couple 
(A2, A3), we also get the expression of the first order multi- 
dimensional sensitivity index 



'{2,3} 



1 + a 2 + 2pa 

2 + a 2 + 2pa 



Choosing p = —0.2 and a = 0.4, we have 



Si = £{2,3} = 0.5, S 2 = 0.4232, S3 = 0.02 



If we interpret these indices as suggested by Jacques' multidi- 
mensional sensitivity analysis, the only conclusion we can give 
is that the couple (X 2 ,Xs) has the same importance as Xi. 
Indeed S{ 2 ,3} = Si. But actually the high value of S{ 2 ,3} comes 
from A2 as shown by the exact calculations above, which im- 
plies that the information on S{ 2 ,3} alone is not sufficient. But 
with our method, we can estimate all the first order sensitivity 
indices : 



where p is the correlation of A2 and A3 and a > is the stan- 
dard deviation of A3. The first order sensitivity indices can be 
evaluated analytically : 



SP = 0.4895, 



0.4250, S { 3 1] = 0.0234 



Si 



S 2 



S3 = 



1 



2pa 



+ 2pa 
P? 



2 + a 2 + 2pa 

The first crucial remark to be done in this case is that we must 
take into account correlations to estimate sensitivity indices if 
we want a serious investigation of this model. Indeed, let us 
consider the case where a =1.2 and p = —0.8. We then have 

Si = 0.6579, S 2 = 0.0011, S 3 = 0.1053, 



S\ 



(2) 



0.5081, S. 



(2) 



0.4368, 



Sf ) = 0.0361 



for an average upon 100 simulations with n = 50 and n' = 1000. 
We display in Figure 1 the boxplots corresponding to the distri- 
bution of the sensitivity indices on these 100 simulations with 
the estimator T 2 . Because of the mathematical complexity men- 
tioned before for the computation of the variance of Tj , we are 
not able to recommend one estimator over the other one from 
a theoretical point of view. But in practice, we have observed 
that the variance of T 2 is at least comparable to the variance 
of Ti, and sometimes lower. Nervertheless, the computation of 
T 2 is more difficult as illustrated in Section 2.4. 
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and O'Hagan and to estimate the conditional moments in our 
method. Then, we calculated analytically the multidimensional 
integrals in the Bayesian approach while using a sample of size 



5000 to compute S\ 
to 

Si 



12) 



for i = 1, 2. The Bayesian approach leads 



Input factor 

Figure 1. Boxplot of the estimated sensitivity indices (S^ ) for the 
three-factor additive model, 100 simulations. Dot lines are the true 
values. 



0.9038 and S 2 = 0.0961 
while the local polynomial technique gives 

S[ 2) = 0.9127 and S { 2 2) = 0.0452. 

We can see on this example that the results obtained with both 
methods are comparable. However on this simple case the mul- 
tidimensional integrals were analytically computed, which could 
not be the case in a non-independent setting. If not, a numerical 
integration, if feasible, would lead to less accurate approxima- 
tions as discussed in Section 1.3. 

3.2 Practical Example from Chemical Field : Isomerization of 
the Normal Butane 



Computing 5*2 and S3 with our method, even if both of The isomerization of the normal butane, i.e. molecules with 



them take into account correlations, allows to confirm the ex- 
pected result : all the variability comes from X2, and not from 
X3. This simple example then brings out the limitation of the 
multidimensional approach. 

In the second analytical example we consider the model 
Y = 0.2exp(Xi-3) + 2.2|X 2 | + 1.3X 2 6 -2X 2 2 
-0.5X 2 4 - 0.5Xf + 2.bXf + 0.7Xf 

+ (8^-2)' + (5*2-3)* + ! + C0S ( 3 ^) 

where Xi and X 2 are independent random variables uniformly 
distributed on [—1,1]. Such a model is routinely used at Insti- 
tut Francais du Petrole to compare different response surface 
methodologies as it presents a peak and valleys. The function 
is plotted in Figure 2. 




four carbon atoms, is a chemical process aiming at transforming 
normal butane (nC4) into iso-butane (iC4) in order to obtain 
a higher octane number, favored by iC. A simplified reaction 
mechanism has been used : 



nC4 



iC4 



2 iC4 — ► C3- 
nC4 + iC4 — 



-C5 

C3+C5 



(1) 

(3) 
(4) 



where reaction (1) is the main reversible reaction converting the 
normal butane into iso-butane. Reactions (3) and (4) are sec- 
ondary and irreversible reactions which produce propane (C3) 
and a lump of normal and iso-pentane (C5), paraffins with three 
and five carbon atoms. The model linked to this process can 
be written as 

Y = v (c,9) 

where 

- Y is the 3-dimensional result vector (mole fractions of the 
components nC4, iC4, C3 and C5 ; note that their sum is 1), 

- c is the vector containing the operating conditions (pres- 
sure, temperature,...) and the mole fraction of the input com- 
ponents (nC4 and iC4, this is called the feed), 

- 6 = (0*)i=i,...,8 is the 8-dimensional random vector of the 
parameters of the reactions (pre-exponential factors, activation 
energies, adsorption constants,...), 



Figure 2. Function proposed in model 2 . 
In this case, the sensitivity indices are 

Si = 0.9375 and S 2 = 0.0625 

We considered a 6 x 6 regular grid on [0,1] 2 and used it 
to estimate the posterior distribution in the method of Oakley 



- / is the function modeling the chemical reactor in which 
the reaction takes place. It is evaluated through the resolution 
of an ordinary differential equations system which can not be 
analytically solved and is calculated numerically. 

The first step here is to get the distribution of which is 
unknown. However, it is possible to use the experience and the 
knowledge of chemical engineers to suggest a reasonable approx- 
imation of this distribution. Classically, we assume that 6 has 
a multivariate Gaussian distribution with mean zero (once the 
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parameters are centered). Concerning the correlation matrix, it 
is built with experts and with the help of bootstrap simulations 
and is given by : 



r = 
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In order to compute sensitivity indices, we generate a sample 
of size n = 5000 from this distribution. 

Here we wish to estimate, for a given operating conditions and 
feed vector c, the sensitivity indices of the outputs with respect 
to the input factors in 0, i.e. 

■_ Var(E(Y^)) 

Var(lS-) 

for j = 1, 3 and i = 1, 8. Actually, our goal is to identify 
on which factor we should make the effort of reducing the un- 
certainty, by carrying out new experiments. This factor should 
be chosen in order to reduce as much as possible the uncertainty 
of the outputs. 

We consider two particular vectors c\ and C2 containing the 
same operating conditions but a different feed (ci : nC4=l 
and iC4=0, c 2 : iC4=l and nC4=0). We have drawn for each 
vector Ci, i = 1,2 a sample of size n from Y by Monte-Carlo 
simulations, i.e. by computing Yj = r](ci,9j) for j = l,...,n. 
Thus we have a sample from (Y, 6) for each particular ci and 
C2- For instance, the estimates of the sensitivity indices of the 
third output C3+C5 with the T\ estimator are given in Figure 
4. Filled bars correspond to c\ and empty bars to c 2 . 

Note that the estimates given by the T 2 estimator are simi- 
lar. These results highlight the behavior of the C3+C5 output 
when the feed changes. Indeed when we only use nC4 in the 
feed (ci) the production of C3+C5 is mainly linked to the 
production of iC4 by reaction (1). This is confirmed by the 
importance of parameters 1 and 6 in Figure 4 which are the 
parameters involved in reaction (1). When the feed only con- 
tains iC4 (02), the first reaction is no longer dominating for 
the production of C3+C5, now mainly linked to reaction (3). 
Parameters 4 and 2 that are the most important in Figure 4 
for C2 are connected to reaction (3). We can thus conclude that 
the results confirm the expected behavior of the C3+C5 output. 



0.5 - 
0.4 - 

o 

■5 0.3 - 



0.2 - 




Figure 4- Sensitivity indices of the C3+C5 output in the isomer- 
ization model for the particular conditions ci (filled bars) and C2 
(empty bars). 

We could obviously study the sensitivity indices for the other 
outputs, and for other operating conditions. Such a study has 
been carried out and showed that the most influent parameters 
depend on the operating conditions and the feed. But it also 
underlined that each parameter of the model has an influence 
on at least one output for at least one operating condition. In 
this case these sensitivity indices estimates enlighten the fact 
that all the parameters are potentially important. A discussion 
with chemical engineers would then be necessary in order to 
identify which outputs are most critical for their goals (control- 
ling for instance the first output iC4 which is strongly linked to 
the octane rate) and would thus help us to choose which input 
parameters deserve most attention. 

4. DISCUSSION AND CONCLUSION 

The estimation method proposed in this paper is an efficient 
way to carry out sensitivity analysis by computing first order 
sensitivity indices when the inputs are not independent. The 
use of local polynomial estimators is the key point of the es- 
timation procedure. It guarantees some interesting theoretical 
properties and ensures good qualities to the estimators we have 
introduced. Beyond these theoretical results, practical exam- 
ples also show a good precision for a rather low computation 
time. Obviously, higher precision requires higher calculation 
time and the user has the possibility to adapt the estimators, 
by fixing some hyper-parameter values such as polynomials or- 
ders. 

The main advantage of our estimators is obviously that they 
only make the assumption that the marginals are smooth and 
then require less model runs than classical sampling methods. 
Comparing with the Bayesian approach of Oakley & O'Hagan 
(2004), our method has the same philosophy as it uses model 
runs to fit a response surface under smoothness assumptions, 
but we avoid its numerical integration issue in high dimension. 
Moreover our approach is appealing for practioners in the sense 
that they can see it as a black-box routine, as each step of the 
procedure is data-driven once the user has given the two sam- 
ples needed for the estimation. 

Finally, we think that a practitioner willing to carry out a 
sensitivity analysis should combine different approach to get 
the most accurate result, for example computing the indices 
with the method we introduce her and the one of Oakley and 
O'Hagan. Indeed these two methods are not concurrent but 
complementary. 

Future work will also be based on building multi-outputs 
sensitivity indices through multivariate nonparametric regres- 
sion techniques. 
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APPENDIX : PROOFS OF THEOREMS analysis are bounded and so have densities with compact sup- 

port. 



A.l Assumptions 



A. 2 Proof of Theorem 1 



We list below all the assumptions we use in the development 
of our proofs. Note that the bandwidths hi and h 2 are by This theorem is a direct consequence of the asymptotic behavior 
definition positive real numbers. of the bias and variance in local linear regression. 

Under assumptions (A0)-(A4), Fan et al. (1996) established 
(AO) As n -> oo, hi -> and nh x -> oo ; that for a given kernel K{.) 

(Al) The kernel K (.) is a bounded symmetric and continu- Ex(m(x)) = m(x) + -^fh{x)h\ + o P (hf) (6) 



ous density function with finite 7 moment 



T 

and 



(A2) fx{x) > and ,f x (.) is bounded in a neighborhood of Var x (m(a;)) = — h o P {h\) (7) 

j x \xjTiri\ 



x where fx(-) denotes the marginal density function of X ; 
(A3) rh{.) exists and is continuous in a neighborhood of x ; 



where fif. — J u k K{u)du and Vk ~ J u k K 2 (u)du. Now as the 



(BO) As n — > oo, hi — ► and liminf nhf > for i = 1, 2 ; 



estimator Ti is 

(A4) ct 2 (.) has a bounded third derivative in a neighborhood „' 
of x and m(x) ^ ; ^ = _L_ (m(X,) - m) ^ 



we can write 



1 ™ 

(Bl) The kernel K(.) is a symmetric density function with _ — Z) 2 

a bounded support in R. Further, |Jf {x\) — K [x^ <c\x\ — x 2 \ n ' ~ ^ j=i 

for xi , X2 € R ; , 

1 r 

(B2) The marginal density function satisfies > ^'^' =1 '-' n ' :_ M-*j))i=i,...,n' and Z ~ —, Z J - B ^ 

and |/x(afi) - fx(x 2 )\ < c\xi — x%\ for X\,x% G R ; conditioning on the predictors X, the sample (^|X) i=1) ... )n , is 

an i.i.d. sample distributed as Zi|X and the conditional bias of 
(B3) E(Y ) < oo ; can then be obtained through the classical formula for the 

empirical estimator of the variance : 

(B4) a 2 (x) > and the function E(Y k \X = .) is continuous 
at x for k = 3,4. Further, m(.) and a 2 (.) are uniformly con- E x (7\) = Var x (Zi) = E X (Z 2 ) - E x (Zi) 2 . 

tinuous on an open set containing the point x ; AT , , .. . 

r oii Note that we can also compute its variance 

(CO) has compact support [a, 6] # = 1 / ^ _ Ex(Zi))4) _ ^3 (Varx( ^ )} 2 

n \ n — 1 

Assumptions (AO) and (BO) are standard ones in kernel esti- . . . .. . , _ , , . 

„ , . , ,. , ,„„„ even though we do not use this result here see Kemark 1. . 

mation theory. Some classical considerations on MSF or M1SF , , . 



As X is independent of X and Y, we write 
E X (Z 2 ) = / Ex(m{x) 2 )f x (x)dx 



(Mean Integrated Squared Error) lead to theoretical optimal 
constant bandwidths of order n -1 / 5 . 

Assumptions (Al) and (Bl) are directly satisfied by com- 
monly used kernel functions. We can note that they require ( 

a kernel with bounded support, but this is only a technical = / (Var x (m(x)) + E x (m(x)) ) f x (x)dx. 

assumption for brevity of proofs. For example, the Gaussian 

kernel can be used Considering assumptions (A3), (A4) and (CO) we then get us- 

The assumption f x (x) > in (A2) and (B2) simply ensures in S @. and ©, in a similar way as for the standard MISE 
that the experimental design is rich enough. The fact that (A2) evaluation, 

also requires fx(-) to be bounded in a neighborhood of x is nat- r VQ r 

ural. The Lipschitz condition on / in (B2) is directly satisfied E X (ZJ = / m(x) fx(x)dx + — J a 2 {x)dx 

if / is sufficiently regular and with compact support. r 

Assumptions (A3), (A4), (B3) and (B4) are natural and / m(x)rh(x)f x (x)dx + o P {h\) 

ensure sufficient regularity to the conditional moments. 

Assumption (CO) is made to make the presentation easier. 
It can be relaxed by means of the conventional truncation tech- and b ^ the same a rguments we also have 
niques used in real cases (Mack & Silverman (1982)). Never- r 1 r 

theless in practice, the input factors considered in sensitivity E x(^i) = / m{x)f x {x)dx + -^K j m{x)f x {x)dx + o P {h 1 ), 
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which finally leads to 

Ex(fi) = Ex^^-Ex^) 2 
= Var(E(F|X)) 



where 



Vi = -H2 I (x)f x (x)dx 



where 
Mi 

and 



m{x)rh(x)fx (x)dx 

m{x)fx{x)dx S j (^J rh{x)fx(x)dx 

+ J 1 L I ° 2 {x)dx + o P {h\) 
nhi J 

Var(E(F|X)) + M Y h\ + ^ + o P (h 2 ) 

u 2 J m(x)rh{x)fx{x)dx 
- (^j m{x)f x {x)dx S j (^J m(x)f x (x)da 

M 2 = vq 



and using the same arguments 



Var x (f 2 ) = - {E(Var(Y|A) 2 
n 



+/12/J 2 . J cr 2 (x)a 2 (x)fx(x)dx 

- t i 2 hl <r 2 (x)fx(x)dx^ [J <7 2 (x)f x (x)d, 



+ 



nh 2 



(x)X 2 (x)dx 



+o P [ hi + h\ 



1 



I nhi 



= - (E(Var(F|X) 2 ) + V 2 h 2 + V 3 hj + A 
n' y nh 2 



+o P [ hi + hi 



/nhn 



j <r 2 (x)dx. 



where 



A.3 Proof of Theorem 2 



Similarly wc first recall asymptotic results for the residual-based 
estimator of the conditional variance. 

Under assumptions (B0)-(B4) Fan & Yao (1998) showed 
that 



V 2 = ^2 J C7 2 (x)a 2 (x)fx(x)dx, 
V 3 = -ii 2 a 2 (x)fx(x)dx^j a 2 (x)fx(x)dx^ 
V A = v Q j a A (x)X 2 (x)dx. 



E x (<7 2 (x)) = a 2 (x) + ^ 2 a 2 (x)h 2 2 + o P {h\ + h\) 



and 



Var x (a 2 (x)) = 



v cr A (x)\ 2 {x) 



Op 



1 



fx(x)nh 2 ' ~* Wnha, 

where \ 2 (x) = E((e 2 — 1) 2 |A = x) and \i 2 and v$ are as defined 
above. The estimator T 2 can be written as 



n 



where (Uj)j—i t ... tn i := (<7 2 (Aj))j = i,. ..,„<. As in the proof of The- 
orem 1, we then get the conditional bias and variance of T 2 : 

Ex(T 2 ) =E X (U 1 ) 

and 

Var x (f 2 ) = lvar x (*7i). 
n 

As X is independent of X and Y, we have 

E x (C/i) = / E x (a 2 (x))f x (x)dx. 

Considering assumptions (B4) and (CO) as in the proof of The- 
orem 1 we then get 

E X (T 2 ) = E(Va r (Y\X)) + ^ 2 h 2 2 J a 2 (x)f x (x)dx 

+op{h\ + h 2 2 ) 
= E(Var(r|A)) + y 1 /i 2 +op(/t 2 + /i 2 ) 
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